Abstract

The sequential importance sampling (SIS) algorithm has gained considerable popularity
for its empirical success. One of its noted applications is to the binary contingency
tables problem, an important problem in statistics, where the goal is to estimate the
number of 0/1 matrices with prescribed row and column sums. We give a family of
examples in which the SIS procedure, if run for any subexponential number of trials,
will underestimate the number of tables by an exponential factor. This result holds for
any of the usual design choices in the SIS algorithm, namely the ordering of the columns
and rows. These are apparently the first theoretical results on the efficiency of the SIS
algorithm for binary contingency tables. Finally, we present experimental evidence that
the SIS algorithm is efficient for row and column sums that are regular. Our work is a
first step in determining the class of inputs for which SIS is effective.

1 Introduction



Sequential importance sampling is a widely used approach for randomly sampling from complex
distributions. It has been applied in a variety of fields, such as protein folding [Hj,
population genetics [3], and signal processing [5]. Binary contingency tables is an application
where the virtues of sequential importance sampling have been especially highlighted;
see Chen et al. [2]. This is the subject of this note. Given a set of non-negative row sums
$r = (r_1, \dots, r_m)$ and column sums $c = (c_1, \dots, c_n)$, let $\Omega = \Omega_{r,c}$ denote the set of
$m \times n$ 0/1 tables with row sums $r$ and column sums $c$. Our focus is on algorithms for sampling
(almost) uniformly at random from $\Omega$, or estimating $|\Omega|$.

Sequential importance sampling (SIS) has several purported advantages over the more 
classical Markov chain Monte Carlo (MCMC) method, such as: 

Speed: Chen et al. [2] claim that SIS is faster than MCMC algorithms. However, we present
a simple example where SIS requires time exponential in $n, m$. In contrast, an MCMC
algorithm was presented in jH which is guaranteed to require at most time polynomial
in $n, m$ for every input.

Convergence Diagnostic: One of the difficulties in MCMC algorithms is determining when
the Markov chain of interest has reached the stationary distribution, in the absence of
analytical bounds on the mixing time. SIS seemingly avoids such complications since its
output is guaranteed to be an unbiased estimator of $|\Omega|$. Unfortunately, it is unclear how
many estimates from SIS are needed before we have a guaranteed close approximation
of $|\Omega|$. In our example for which SIS requires exponential time, the estimator appears
to converge, but it converges to a quantity that is off from $|\Omega|$ by an exponential factor.

Before formally stating our results, we detail the sequential importance sampling approach
for contingency tables, following [2]. The general importance sampling paradigm involves
sampling from an 'easy' distribution $\mu$ over $\Omega$ that is, ideally, close to the uniform distribution.
At every round, the algorithm outputs a table $T$ along with $\mu(T)$. Since for any $\mu$ whose
support is $\Omega$ we have

$$E[1/\mu(T)] = |\Omega|,$$

we take many trials of the algorithm and output the average of $1/\mu(T)$ as our estimate of $|\Omega|$.
More precisely, let $T_1, \dots, T_t$ denote the outputs from $t$ trials of the SIS algorithm. Our final
estimate is

$$X_t = \frac{1}{t} \sum_{i=1}^{t} \frac{1}{\mu(T_i)}.$$

One typically uses a heuristic to determine how many trials $t$ are needed until the estimator
has converged to the desired quantity.

The sequential importance sampling algorithm of Chen et al. [2] constructs the table $T$
in a column-by-column manner. It is not clear how to order the columns optimally, but this
will not concern us as our negative results will hold for any ordering of the columns. Suppose
the procedure is assigning column $j$. Let $r'_1, \dots, r'_m$ denote the residual row sums after taking
into account the assignments in the first $j - 1$ columns.

The procedure of Chen et al. chooses column $j$ from the correct probability distribution
conditional on $c_j$, $r'_1, \dots, r'_m$ and the number of columns remaining (but ignoring the column
sums $c_{j+1}, \dots, c_n$). This distribution is easy to describe in closed form. We assign column $j$
the vector $(t_1, \dots, t_m) \in \{0, 1\}^m$, where $\sum_i t_i = c_j$, with probability proportional to

$$\prod_{i=1}^{m} \left( \frac{r'_i}{n' - r'_i} \right)^{t_i},$$

where $n' = n - j + 1$. If no valid assignment is possible for the $j$-th column, then the procedure
restarts from the beginning with the first column (and sets $1/\mu(T_i) = 0$ for this trial).
Sampling from the above distribution over assignments for column $j$ can be done efficiently
by dynamic programming.
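
The column-by-column procedure just described can be sketched in a few lines of Python. This is
a minimal illustration under our own assumptions, not the implementation of Chen et al.: it
covers only the basic procedure (a failed trial contributes 0 to the average), the names
`sample_column`, `sis_trial` and `sis_estimate` are hypothetical, a row whose residual sum equals
the number of remaining columns is treated as forced to receive a 1, and the dynamic program is
a table of elementary symmetric sums of the row weights.

```python
import random

def sample_column(res, c, n_rem, rng):
    """Draw a 0/1 column with sum c, with probability proportional to
    prod_i (res[i] / (n_rem - res[i])) ** t_i.
    Returns (column, probability), or None if no assignment has positive weight."""
    m = len(res)
    if any(x > n_rem for x in res):
        return None                                   # some row can no longer be completed
    forced = [i for i in range(m) if res[i] == n_rem]  # must get a 1 in every remaining column
    free = [i for i in range(m) if 0 < res[i] < n_rem]
    k = c - len(forced)                                # ones still to place among free rows
    if k < 0 or k > len(free):
        return None
    w = [res[i] / (n_rem - res[i]) for i in free]
    size = len(free)
    # S[i][q] = sum over q-subsets of free[i:] of the product of their weights
    S = [[0.0] * (k + 1) for _ in range(size + 1)]
    for i in range(size + 1):
        S[i][0] = 1.0
    for i in range(size - 1, -1, -1):
        for q in range(1, k + 1):
            S[i][q] = S[i + 1][q] + w[i] * S[i + 1][q - 1]
    col = [0] * m
    for i in forced:
        col[i] = 1
    prob = 1.0
    for pos, i in enumerate(free):
        if k == 0:
            break
        p_one = w[pos] * S[pos + 1][k - 1] / S[pos][k]
        if rng.random() < p_one:
            col[i] = 1
            prob *= p_one
            k -= 1
        else:
            prob *= 1.0 - p_one
    return col, prob

def sis_trial(r, c, rng):
    """One SIS trial: returns 1/mu(T) for the sampled table T, or 0.0 on a dead end."""
    res = list(r)
    mu = 1.0
    for j, cj in enumerate(c):
        out = sample_column(res, cj, len(c) - j, rng)
        if out is None:
            return 0.0
        col, p = out
        mu *= p
        res = [x - t for x, t in zip(res, col)]
    return 1.0 / mu if all(x == 0 for x in res) else 0.0

def sis_estimate(r, c, trials, seed=0):
    """Average of 1/mu(T) over independent trials."""
    rng = random.Random(seed)
    return sum(sis_trial(r, c, rng) for _ in range(trials)) / trials

print(sis_estimate((1, 1, 2), (1, 1, 1, 1), 20000))
```

For row sums (1, 1, 2) and column sums (1, 1, 1, 1) there are exactly $\binom{4}{2} \cdot 2! = 12$
tables, so the printed estimate should be close to 12.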

Remark 1. Chen et al. devised a more subtle procedure which guarantees that there will always 
be a suitable assignment of every column. We do not describe this interesting modification of 
the procedure, as the two procedures are equivalent for the input instances which we discuss in 
this paper. 

We now state our negative result. This is a simple family of examples where the SIS
algorithm will grossly underestimate $|\Omega|$ unless the number of trials $t$ is exponentially large.
Our examples will have the form $(1, 1, \dots, 1, d_r)$ for row sums and $(1, 1, \dots, 1, d_c)$ for column
sums, where the number of rows is $m + 1$, the number of columns is $n + 1$, and we require
that $m + d_r = n + d_c$.

Theorem 2. Let $\beta > 0$, $\gamma \in (0, 1)$ be constants satisfying $\beta \neq \gamma$ and consider the input
instances $r = (1, 1, \dots, 1, \lfloor \beta m \rfloor)$, $c = (1, 1, \dots, 1, \lfloor \gamma m \rfloor)$ with $m + 1$ rows. Fix any order
of columns (or rows, if sequential importance sampling constructs tables row-by-row) and let
$X_t$ be the random variable representing the estimate of the SIS procedure after $t$ trials of the
algorithm. There exist constants $s_1 \in (0, 1)$ and $s_2 > 1$ such that for every sufficiently large
$m$ and for any $t \le s_2^m$,

$$\Pr\left( X_t \ge \frac{|\Omega_{r,c}|}{s_2^m} \right) \le 3 s_1^m. \tag{2}$$
In contrast, note that there are MCMC algorithms which provably run in time polynomial
in $n$ and $m$ for any row/column sums. In particular, the algorithm of Jerrum, Sinclair, and
Vigoda for the permanent of a non-negative matrix yields as a corollary a polynomial time
algorithm for any row/column sums. More recently, Bezakova, Bhatnagar and Vigoda
have presented a related simulated annealing algorithm that works directly with binary
contingency tables and has an improved polynomial running time. We note that, in addition to
being formally asymptotically faster than any exponential time algorithm, a polynomial time
algorithm has additional theoretical significance in that it (and its analysis) implies non-trivial
insight into the structure of the problem.

Some caveats are in order here. Firstly, the above results imply only that MCMC outperforms
SIS asymptotically in the worst case; for many inputs, SIS may well be much more
efficient. Secondly, the rigorous worst case upper bounds on the running time of the above
MCMC algorithms are still far from practical. Chen et al. [2] showed several examples where
SIS outperforms MCMC methods. We present a more systematic experimental study of the
performance of SIS, focusing on examples where all the row and column sums are identical
as well as on the "bad" examples from Theorem 2. Our experiments suggest that SIS is extremely
fast on the balanced examples, while its performance on the bad examples confirms
our theoretical analysis.

We begin in Section 2 by presenting a few basic lemmas that are used in the analysis of our
negative example. In Section 3 we present our main example where SIS is off by an exponential
factor, thus proving Theorem 2. Finally, in Section 4 we present some experimental results
for SIS that support our theoretical analysis.

2 Preliminaries 

We will continue to let $\mu(T)$ denote the probability that a table $T \in \Omega_{r,c}$ is generated by
the sequential importance sampling algorithm. We let $\pi(T)$ denote the uniform distribution over
$\Omega$, which is the desired distribution.

Before beginning our main proofs we present two straightforward technical lemmas which
are used at the end of the proof of the main theorem. The first lemma claims that if a large
set of binary contingency tables gets a very small probability under SIS, then SIS is likely to
output an estimate which is not much bigger than the size of the complement of this set, and
hence very small. Let $\bar{A} = \Omega_{r,c} \setminus A$.

Lemma 3. Let $p \le 1/2$ and let $A \subseteq \Omega_{r,c}$ be such that $\mu(A) \le p$. Then for any $a > 1$, and
any $t$, we have

$$\Pr\left( X_t \le a \, \pi(\bar{A}) \, |\Omega_{r,c}| \right) \ge 1 - 2pt - 1/a.$$

Proof. The probability that all $t$ SIS trials are not in $A$ is at least

$$(1 - p)^t \ge \exp(-2pt) \ge 1 - 2pt,$$

where the first inequality follows from $\ln(1 - x) \ge -2x$, valid for $0 \le x \le 1/2$, and the second
inequality is the standard $\exp(-x) \ge 1 - x$ for $x \ge 0$.

Let $T_1, \dots, T_t$ be the $t$ tables constructed by SIS. Then, with probability at least $1 - 2pt$, we
have $T_i \in \bar{A}$ for all $i$. Notice that for a table $T$ constructed by SIS from $\bar{A}$, we have

Let $\mathcal{F}$ denote the event that $T_i \in \bar{A}$ for all $i$, $1 \le i \le t$; hence,

$$E(X_t \mid \mathcal{F}) = |\bar{A}|.$$

We can use Markov's inequality to estimate the probability that SIS returns an answer
which is more than a factor of $a$ worse than the expected value, conditioned on the fact that
no SIS trial is from $A$:

$$\Pr\left( X_t > a |\bar{A}| \mid \mathcal{F} \right) \le \frac{1}{a}.$$

Finally, removing the conditioning we get:

$$\Pr\left( X_t \le a |\bar{A}| \right) \ge \Pr\left( X_t \le a |\bar{A}| \mid \mathcal{F} \right) \Pr(\mathcal{F}) \ge \left( 1 - \frac{1}{a} \right) (1 - 2pt) \ge 1 - 2pt - \frac{1}{a}.$$

Since $|\bar{A}| = \pi(\bar{A}) \, |\Omega_{r,c}|$, this is the claimed bound.

□
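
As a quick numerical sanity check of the inequality chain used in the proof, and of how the
guarantee of Lemma 3 behaves for concrete numbers, one can evaluate both sides directly. The
helper names `chain_holds` and `lemma3_lower_bound` and the sample values of $p$, $t$ and $a$ are
our own illustrative choices.

```python
import math

def chain_holds(p, t):
    """Check (1 - p)^t >= exp(-2 p t) >= 1 - 2 p t numerically."""
    lhs = (1.0 - p) ** t
    mid = math.exp(-2.0 * p * t)
    return lhs >= mid >= 1.0 - 2.0 * p * t

def lemma3_lower_bound(p, t, a):
    """Right-hand side of Lemma 3: 1 - 2pt - 1/a."""
    return 1.0 - 2.0 * p * t - 1.0 / a

# The chain holds on a small grid of values with p <= 1/2.
print(all(chain_holds(p, t) for p in (1e-6, 1e-3, 0.1, 0.5) for t in (1, 10, 1000)))

# If SIS hits A with probability at most 1e-9 per trial, then after a million trials
# the estimate is at most 100 * |complement of A| with probability at least about 0.988.
print(lemma3_lower_bound(1e-9, 10**6, 100))
```

The bound is only informative while $2pt$ stays well below 1, which is why the lemma is applied
with $p$ exponentially small and $t$ subexponential.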



The second technical lemma shows that if in a row with large sum (linear in $m$) there
exists a large number of columns (again linear in $m$) for which the SIS probability of placing
a 1 at the corresponding position differs significantly from the correct probability, then in any
subexponential number of trials the SIS estimator will very likely exponentially underestimate
the correct answer.

Lemma 4. Let $\alpha < \beta$ be positive constants. Consider a class of instances of the binary
contingency tables problem, parameterized by $m$, with $m + 1$ row sums, the last of which is
$\lfloor \beta m \rfloor$. Let $\mathcal{A}_i$ denote the set of all valid assignments of 0/1 to columns $1, \dots, i$. Suppose that
there exist constants $f < g$ and a set $I$ of cardinality $\lfloor \alpha m \rfloor$ such that one of the following
statements is true:

(i) for every $i \in I$ and any $A \in \mathcal{A}_{i-1}$,

$$\pi(A_{m+1,i} = 1 \mid A) \le f < g \le \mu(A_{m+1,i} = 1 \mid A),$$

(ii) for every $i \in I$ and any $A \in \mathcal{A}_{i-1}$,

$$\mu(A_{m+1,i} = 1 \mid A) \le f < g \le \pi(A_{m+1,i} = 1 \mid A).$$

Then there exists a constant $b_1 \in (0, 1)$ such that for any constant $1 < b_2 < 1/b_1$ and any
sufficiently large $m$, for any $t \le b_2^m$,

$$\Pr\left( X_t \ge \frac{|\Omega_{r,c}|}{b_2^m} \right) \le 3 (b_1 b_2)^m.$$

In words, in $b_2^m$ trials of sequential importance sampling, with probability at least
$1 - 3 (b_1 b_2)^m$ the output is a number which is at most a $b_2^{-m}$ fraction of the total number of
corresponding binary contingency tables.

Proof. We will analyze case (i); the other case follows from analogous arguments. Consider
indicator random variables $U_i$ representing the event that the uniform distribution places 1
in the last row of the $i$-th column. Similarly, let $V_i$ be the corresponding indicator variable
for the SIS. The random variable $U_i$ is dependent on $U_j$ for $j < i$ and $V_i$ is dependent on
$V_j$ for $j < i$. However, each $U_i$ is stochastically dominated by $U'_i$ which has value 1 with
probability $f$, and each $V_i$ stochastically dominates the random variable $V'_i$ which takes value
1 with probability $g$. Moreover, the $U'_i$ and $V'_i$ are respectively i.i.d.

Now we may use the Chernoff bound. Let $k = \lfloor \alpha m \rfloor$. Then

$$\Pr\left( \sum_{i \in I} U'_i - kf \ge \frac{g - f}{2} k \right) < \exp(-(g - f)^2 k / 8)$$

and

$$\Pr\left( kg - \sum_{i \in I} V'_i \ge \frac{g - f}{2} k \right) < \exp(-(g - f)^2 k / 8).$$

Let $S$ be the set of all tables which have less than $kf + (g - f)k/2 = kg - (g - f)k/2$
ones in the last row of the columns in $I$. Let $b_1 := \exp(-(g - f)^2 \alpha / 16) \in (0, 1)$. Then
$\exp(-(g - f)^2 k / 8) \le b_1^m$ for $m \ge 1/\alpha$. Thus, by the first inequality, under the uniform
distribution over all binary contingency tables the probability of the set $S$ is at least $1 - b_1^m$.
However, by the second inequality, SIS constructs a table from the set $S$ with probability at most $b_1^m$.

We are ready to use Lemma 3 with $A = S$ and $p = b_1^m$. Since under the uniform distribution
the probability of $S$ is at least $1 - b_1^m$, we have that $|A| \ge (1 - b_1^m) |\Omega_{r,c}|$. Let $b_2 \in (1, 1/b_1)$
be any constant and consider $t \le b_2^m$ SIS trials. Let $a = (b_1 b_2)^{-m}$. Then, by Lemma 3, with
probability at least $1 - 2pt - 1/a \ge 1 - 3 (b_1 b_2)^m$ the SIS procedure outputs a value which is
at most an $a b_1^m = b_2^{-m}$ fraction of $|\Omega_{r,c}|$. □
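
To get a feeling for the constants in Lemma 4, the short sketch below evaluates $b_1$, one admissible
$b_2$, and the resulting guarantee for given $f$, $g$, $\alpha$ and $m$. The function names and the
particular choice $b_2 = b_1^{-1/2}$ are our own assumptions (the lemma allows any $b_2 \in (1, 1/b_1)$),
and the numeric inputs are purely illustrative.

```python
import math

def lemma4_constants(f, g, alpha):
    """b1 = exp(-(g - f)^2 * alpha / 16), and one admissible b2 in (1, 1/b1)."""
    if not (0.0 <= f < g <= 1.0 and alpha > 0.0):
        raise ValueError("need 0 <= f < g <= 1 and alpha > 0")
    b1 = math.exp(-((g - f) ** 2) * alpha / 16.0)
    b2 = b1 ** -0.5          # our arbitrary choice; any value in (1, 1/b1) works
    return b1, b2

def lemma4_guarantee(f, g, alpha, m):
    """Return (allowed trials b2^m, failure probability 3 (b1 b2)^m, fraction b2^-m)."""
    b1, b2 = lemma4_constants(f, g, alpha)
    return b2 ** m, 3.0 * (b1 * b2) ** m, b2 ** (-m)

trials, failure, fraction = lemma4_guarantee(0.3, 0.4, 0.2, 10**6)
print(trials, failure, fraction)
```

With a gap $g - f = 0.1$ and $\alpha = 0.2$ the base $b_1$ is extremely close to 1, so the statement only
becomes meaningful for very large $m$; the lemma is asymptotic and makes no attempt to optimize constants.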



3 Proof of Main Theorem 

In this section we prove Theorem 2. Before we analyze the input instances from Theorem 2,
we first consider the following simpler class of inputs.



3.1 Row sums (1,1,..., 1, d) and column sums (1,1,..., 1) 

The row sums are $(1, \dots, 1, d)$ and the number of rows is $m + 1$. The column sums are $(1, \dots, 1)$
and the number of columns is $n = m + d$. We assume that sequential importance sampling
constructs the tables column-by-column. Note that if SIS constructed the tables row-by-row,
starting with the row with sum $d$, then it would in fact output the correct number of tables
exactly. However, in the next subsection we will use this simplified case as a tool in our
analysis of the input instances $(1, \dots, 1, d_r)$, $(1, \dots, 1, d_c)$, for which SIS must necessarily fail
regardless of whether it works row-by-row or column-by-column, and regardless of the order
it chooses.

Lemma 5. Let $\beta > 0$, and consider an input of the form $(1, \dots, 1, \lfloor \beta m \rfloor)$, $(1, \dots, 1)$ with
$m + 1$ rows. Then there exist constants $s_1 \in (0, 1)$ and $s_2 > 1$, such that for any sufficiently
large $m$, with probability at least $1 - 3 s_1^m$, column-wise sequential importance sampling with
$s_2^m$ trials outputs an estimate which is at most a $s_2^{-m}$ fraction of the total number of corresponding
binary contingency tables. Formally, for any $t \le s_2^m$,

$$\Pr\left( X_t \ge \frac{|\Omega_{r,c}|}{s_2^m} \right) \le 3 s_1^m.$$

The idea for the proof of the lemma is straightforward. By the symmetry of the column
sums, for large $m$ and $d$ and $\alpha \in (0, 1)$ a uniform random table will have about $\alpha d$ ones in
the first $\alpha n$ cells of the last row, with high probability. We will show that for some $\alpha \in (0, 1)$
and $d = \beta m$, sequential importance sampling is very unlikely to put this many ones in the
first $\alpha n$ columns of the last row. Therefore, since with high probability sequential importance
sampling will not construct any table from a set that is a large fraction of all legal tables, it
will likely drastically underestimate the number of tables.

Before we prove the lemma, let us first compare the column distributions arising from the
uniform distribution over all binary contingency tables with the SIS distributions. We refer
to the column distributions induced by the uniform distribution over all tables as the true
distributions. The true probability of 1 in the first column and last row can be computed as
the number of tables with 1 at this position divided by the total number of tables. For this
particular sequence, the total number of tables is $Z(m, d) = \binom{n}{d} m! = \binom{m+d}{d} m!$, since a table
is uniquely specified by the positions of ones in the last row and the permutation matrix in
the remaining rows and corresponding columns. Therefore,

$$\pi(A_{m+1,1} = 1) = \frac{Z(m, d-1)}{Z(m, d)} = \frac{\binom{m+d-1}{d-1} m!}{\binom{m+d}{d} m!} = \frac{d}{m + d}.$$

On the other hand, by the definition of sequential importance sampling, $\Pr(A_{i,1} = 1) \propto
r_i / (n - r_i)$, where $r_i$ is the row sum in the $i$-th row. Therefore,

$$\mu(A_{m+1,1} = 1) = \frac{\frac{d}{n-d}}{\frac{d}{n-d} + m \cdot \frac{1}{n-1}} = \frac{d(m + d - 1)}{d(m + d - 1) + m^2}.$$

Observe that if $d \approx \beta m$ for some constant $\beta > 0$, then for sufficiently large $m$ we have

$$\mu(A_{m+1,1} = 1) > \pi(A_{m+1,1} = 1).$$

As we will see, this will be true for a linear number of columns, which turns out to be enough to
prove that in polynomial time sequential importance sampling exponentially underestimates
the total number of binary contingency tables with high probability.
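
Both first-column probabilities are easy to verify on a small instance. The sketch below, with
hypothetical helper names, enumerates all tables with row sums $(1, \dots, 1, d)$ and unit column sums
by brute force, checks the count $\binom{m+d}{d} m!$ and the true probability $d/(m+d)$, and compares
with the SIS probability computed from the weights $r_i/(n - r_i)$ in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def enumerate_tables(m, d):
    """All tables with row sums (1,...,1,d) and n = m + d unit column sums.
    A table is encoded as a tuple rows, where rows[j] is the row holding the 1 of column j."""
    n = m + d
    target = [1] * m + [d]
    tables = []
    for rows in product(range(m + 1), repeat=n):
        sums = [0] * (m + 1)
        for i in rows:
            sums[i] += 1
        if sums == target:
            tables.append(rows)
    return tables

def true_prob_first(m, d):
    """Uniform probability that the first column has its 1 in the last row."""
    tables = enumerate_tables(m, d)
    hits = sum(1 for rows in tables if rows[0] == m)
    return Fraction(hits, len(tables))

def sis_prob_first(m, d):
    """SIS probability of the same event, from the weights r_i / (n - r_i)."""
    n = m + d
    w_last = Fraction(d, n - d)      # last row, residual sum d
    w_unit = Fraction(1, n - 1)      # each of the m rows with sum 1
    return w_last / (w_last + m * w_unit)

m, d = 3, 2
assert len(enumerate_tables(m, d)) == comb(m + d, d) * factorial(m)
print(true_prob_first(m, d), sis_prob_first(m, d))   # 2/5 8/17
```

Already for $m = 3$, $d = 2$ the SIS probability $8/17$ exceeds the true probability $2/5$: SIS is biased
towards placing a 1 in the heavy row early on.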

Proof of Lemma 5. We will find a constant $\alpha$ such that for every column $i < \alpha m$ we will be
able to derive an upper bound on the true probability and a lower bound on the SIS probability
of 1 appearing at the $(m + 1, i)$ position.

For a partially filled table with columns $1, \dots, i - 1$ assigned, let $d_i$ be the remaining sum
in the last row and let $m_i$ be the number of other rows with remaining row sum 1. Then the
true probability of 1 in the $i$-th column and last row can be bounded as

$$\pi(A_{m+1,i} = 1 \mid A_{(m+1) \times (i-1)}) = \frac{d_i}{m_i + d_i} \le \frac{d}{m + d - i} =: f(d, m, i),$$

while the probability under SIS can be bounded as

$$\mu(A_{m+1,i} = 1 \mid A_{(m+1) \times (i-1)}) = \frac{d_i (m_i + d_i - 1)}{d_i (m_i + d_i - 1) + m_i^2} \ge \frac{(d - i)(m + d - i - 1)}{d(m + d - 1) + m^2} =: g(d, m, i).$$

Observe that for fixed $m, d$, the function $f$ is increasing and the function $g$ is decreasing in $i$,
for $i < d$.

Recall that we are considering a family of input instances parameterized by $m$ with $d =
\lfloor \beta m \rfloor$, for a fixed $\beta > 0$. We will consider $i < \alpha m$ for some $\alpha \in (0, \beta)$. Let

$$f^\infty(\alpha, \beta) := \lim_{m \to \infty} f(d, m, \alpha m) = \frac{\beta}{1 + \beta - \alpha}; \tag{3}$$

$$g^\infty(\alpha, \beta) := \lim_{m \to \infty} g(d, m, \alpha m) = \frac{(\beta - \alpha)(1 + \beta - \alpha)}{\beta(1 + \beta) + 1}; \tag{4}$$

$$\Lambda_\beta := g^\infty(0, \beta) - f^\infty(0, \beta) = \frac{\beta^2}{(1 + \beta)(\beta(1 + \beta) + 1)} > 0, \tag{5}$$

and recall that for fixed $\beta$, $f^\infty$ is increasing in $\alpha$ and $g^\infty$ is decreasing in $\alpha$, for $\alpha < \beta$. Let
$\alpha < \beta$ be such that $g^\infty(\alpha, \beta) - f^\infty(\alpha, \beta) = \Lambda_\beta / 2$. Such an $\alpha$ exists by continuity and the fact
that $g^\infty(\beta, \beta) < f^\infty(\beta, \beta)$.

By the above, for any $\epsilon > 0$ and sufficiently large $m$, and for any $i < \alpha m$, the true
probability is upper-bounded by $f^\infty(\alpha, \beta) + \epsilon$ and the SIS probability is lower-bounded by
$g^\infty(\alpha, \beta) - \epsilon$. For our purposes it is enough to fix $\epsilon = \Lambda_\beta / 8$. Now we can use Lemma 4
with $\alpha$ and $\beta$ defined as above, $f = f^\infty(\alpha, \beta) + \epsilon$ and $g = g^\infty(\alpha, \beta) - \epsilon$ (notice that all these
constants depend only on $\beta$), and $I = \{1, \dots, \lfloor \alpha m \rfloor\}$. This finishes the proof of the lemma
with $s_1 = b_1 b_2$ and $s_2 = b_2$. □
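
The constant $\alpha$ in this proof is defined only implicitly, but it is easy to compute numerically.
The following sketch, under our own naming (`f_inf`, `g_inf`, `find_alpha`), uses the limits
$f^\infty(\alpha, \beta) = \beta / (1 + \beta - \alpha)$ and $g^\infty(\alpha, \beta) = (\beta - \alpha)(1 + \beta - \alpha) / (\beta(1 + \beta) + 1)$
and finds by bisection the $\alpha \in (0, \beta)$ at which the gap has shrunk to half of its value at $\alpha = 0$.

```python
def f_inf(alpha, beta):
    """Limit of the upper bound on the true probability."""
    return beta / (1.0 + beta - alpha)

def g_inf(alpha, beta):
    """Limit of the lower bound on the SIS probability."""
    return (beta - alpha) * (1.0 + beta - alpha) / (beta * (1.0 + beta) + 1.0)

def gap(alpha, beta):
    return g_inf(alpha, beta) - f_inf(alpha, beta)

def find_alpha(beta, tol=1e-12):
    """Bisection for alpha in (0, beta) with gap(alpha) = gap(0) / 2.
    Valid because the gap is continuous and strictly decreasing in alpha on [0, beta]."""
    half = gap(0.0, beta) / 2.0
    lo, hi = 0.0, beta               # gap(lo) > half > gap(hi) = -beta
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid, beta) > half:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

beta = 0.6
alpha = find_alpha(beta)
print(alpha, gap(0.0, beta), gap(alpha, beta))
```

The value $\beta = 0.6$ is only an example; any $\beta > 0$ works, and the resulting $\alpha$ is the fraction of
columns on which SIS is guaranteed to be noticeably biased.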

Remark 6. Notice that every contingency table with row sums $(1, 1, \dots, 1, d)$ and column
sums $(1, 1, \dots, 1)$ is binary. Thus, this instance proves that the column-based SIS procedure
for general (non-binary) contingency tables has the same flaw as the binary SIS procedure. We
expect that the negative example used for Theorem 2 also extends to general (i.e., non-binary)
contingency tables, but the analysis becomes more cumbersome.



3.2 Proof of Theorem 2

Recall that we are working with row sums $(1, 1, \dots, 1, d_r)$, where the number of rows is $m + 1$,
and column sums $(1, 1, \dots, 1, d_c)$, where the number of columns is $n + 1 = m + 1 + d_r - d_c$.
We will eventually fix $d_r = \lfloor \beta m \rfloor$ and $d_c = \lfloor \gamma m \rfloor$, but to simplify our expressions we work
with $d_r$ and $d_c$ for now.

The theorem claims that the SIS procedure fails for an arbitrary order of columns with
high probability. We first analyze the case when the SIS procedure starts with columns of sum
1; we shall address the issue of arbitrary column order later. As before, under the assumption
that the first column has sum 1, we compute the probabilities of 1 being in the last row for
uniform random tables and for SIS respectively. For the true probability, the total number
of tables can be computed as $\binom{m}{d_c} \binom{n}{d_r} (m - d_c)! + \binom{m}{d_c - 1} \binom{n}{d_r - 1} (m - d_c + 1)!$, since a table is
uniquely determined by the positions of ones in the $d_c$ column and $d_r$ row and a permutation
matrix on the remaining rows and columns. Thus we have

$$\pi(A_{(m+1),1}) = \frac{\binom{m}{d_c} \binom{n-1}{d_r-1} (m - d_c)! + \binom{m}{d_c-1} \binom{n-1}{d_r-2} (m - d_c + 1)!}{\binom{m}{d_c} \binom{n}{d_r} (m - d_c)! + \binom{m}{d_c-1} \binom{n}{d_r-1} (m - d_c + 1)!}
= \frac{d_r (n - d_r + 1) + d_c d_r (d_r - 1)}{n (n - d_r + 1) + n d_c d_r} =: f_2(m, d_r, d_c);$$

$$\mu(A_{(m+1),1}) = \frac{\frac{d_r}{n - d_r}}{\frac{d_r}{n - d_r} + m \cdot \frac{1}{n - 1}} = \frac{d_r (n - 1)}{d_r (n - 1) + m (n - d_r)} =: g_2(m, d_r, d_c).$$

Let $d_r = \lfloor \beta m \rfloor$ and $d_c = \lfloor \gamma m \rfloor$ for some constants $\beta > 0$, $\gamma \in (0, 1)$ (notice that this choice
guarantees that $n \ge d_r$ and $m \ge d_c$, as required). Then, as $m$ tends to infinity, $f_2$ approaches

$$f_2^\infty(\beta, \gamma) := \frac{\beta}{1 + \beta - \gamma},$$

and $g_2$ approaches

$$g_2^\infty(\beta, \gamma) := \frac{\beta (1 + \beta - \gamma)}{\beta (1 + \beta - \gamma) + 1 - \gamma}.$$
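
The counting formula and the true first-column probability can be checked by brute force on a
tiny instance. The sketch below uses hypothetical helper names and the illustrative choice $m = 3$,
$d_r = d_c = 2$ (so $n = 3$ and the table is $4 \times 4$); it enumerates all tables with the prescribed
margins and compares with the closed forms for the count and for $f_2$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial

def brute_force(r, c):
    """All 0/1 tables with row sums r and column sums c, each encoded as a tuple
    of row supports (the set of columns holding a 1 in that row)."""
    ncols = len(c)
    row_choices = [list(combinations(range(ncols), ri)) for ri in r]
    tables = []
    for rows in product(*row_choices):
        col_sums = [0] * ncols
        for support in rows:
            for j in support:
                col_sums[j] += 1
        if col_sums == list(c):
            tables.append(rows)
    return tables

def closed_form_count(m, dr, dc):
    """Number of tables with row sums (1,...,1,dr), column sums (1,...,1,dc), m+1 rows."""
    n = m + dr - dc
    return (comb(m, dc) * comb(n, dr) * factorial(m - dc)
            + comb(m, dc - 1) * comb(n, dr - 1) * factorial(m - dc + 1))

def f2(m, dr, dc):
    """True probability of a 1 in the last row of a first column with sum 1."""
    n = m + dr - dc
    return Fraction(dr * (n - dr + 1) + dc * dr * (dr - 1), n * (n - dr + 1) + n * dc * dr)

m, dr, dc = 3, 2, 2
tables = brute_force([1] * m + [dr], [1] * (m + dr - dc) + [dc])
hits = sum(1 for rows in tables if 0 in rows[-1])   # last row has a 1 in column 0
print(len(tables), closed_form_count(m, dr, dc), Fraction(hits, len(tables)), f2(m, dr, dc))
```

On this instance both counts equal 27 and both probabilities equal $4/9$.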

Notice that $f_2^\infty(\beta, \gamma) = g_2^\infty(\beta, \gamma)$ if and only if $\beta = \gamma$. Suppose that $f_2^\infty(\beta, \gamma) < g_2^\infty(\beta, \gamma)$ (the
opposite case follows analogous arguments and uses the second part of Lemma 4). As in the
proof of Lemma 5, we can define $\alpha$ such that if the importance sampling does not choose the
column with sum $d_c$ in its first $\alpha m$ choices, then in any subexponential number of trials it will
exponentially underestimate the total number of tables with high probability. Formally, we
derive an upper bound on the true probability of 1 being in the last row of the $i$-th column,
and a lower bound on the SIS probability of the same event (both conditioned on the fact
that the $d_c$ column is not among the first $i$ columns assigned). Let $d_r^{(i)}$ be the current residual
sum in the last row, $m_i$ be the number of rows with sum 1, and $n_i$ the remaining number of
columns with sum 1. Notice that $n_i = n - i + 1$, $m \ge m_i \ge m - i + 1$, and $d_r \ge d_r^{(i)} \ge d_r - i + 1$.
Then

$$\pi(A_{(m+1),i} \mid A_{(m+1) \times (i-1)}) = \frac{d_r^{(i)} (n_i - d_r^{(i)} + 1) + d_c d_r^{(i)} (d_r^{(i)} - 1)}{n_i (n_i - d_r^{(i)} + 1) + n_i d_c d_r^{(i)}}
\le \frac{d_r (n - d_r + 1) + d_c d_r^2}{(n - i)(n - i - d_r) + (n - i) d_c (d_r - i)} =: f_3(m, d_r, d_c, i);$$

$$\mu(A_{(m+1),i} \mid A_{(m+1) \times (i-1)}) = \frac{d_r^{(i)} (n_i - 1)}{d_r^{(i)} (n_i - 1) + m_i (n_i - d_r^{(i)})}
\ge \frac{(d_r - i)(n - i)}{d_r n + m (n - d_r)} =: g_3(m, d_r, d_c, i).$$

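As before, notice that if we fix $m, d_r, d_c > 0$ satisfying $d_c < m$ and $d_r < n$, then $f_3$ is an
increasing function and $g_3$ is a decreasing function in $i$, for $i < \min\{n - d_r, d_r\}$. Recall that
$n - d_r = m - d_c$. Suppose that $i \le \alpha m < \min\{m - d_c, d_r\}$ for some $\alpha$ which we specify shortly.
Thus, the upper bound on the true probability in this range of $i$ is $f_3(m, d_r, d_c, \alpha m)$ and the
lower bound on the SIS probability is $g_3(m, d_r, d_c, \alpha m)$. If $d_r = \lfloor \beta m \rfloor$ and $d_c = \lfloor \gamma m \rfloor$, then the
upper bound converges to

$$f_3^\infty(\alpha, \beta, \gamma) := \lim_{m \to \infty} f_3(m, d_r, d_c, \alpha m) = \frac{\beta^2}{(1 + \beta - \gamma - \alpha)(\beta - \alpha)}$$

and the lower bound converges to

$$g_3^\infty(\alpha, \beta, \gamma) := \lim_{m \to \infty} g_3(m, d_r, d_c, \alpha m) = \frac{(\beta - \alpha)(1 + \beta - \gamma - \alpha)}{\beta (1 + \beta - \gamma) + 1 - \gamma}.$$

Let

$$\Lambda_{\beta,\gamma} := g_3^\infty(0, \beta, \gamma) - f_3^\infty(0, \beta, \gamma) = g_2^\infty(\beta, \gamma) - f_2^\infty(\beta, \gamma) > 0.$$

We set $\alpha$ to satisfy $g_3^\infty(\alpha, \beta, \gamma) - f_3^\infty(\alpha, \beta, \gamma) \ge \Lambda_{\beta,\gamma} / 2$ and $\alpha < \min\{1 - \gamma, \beta\}$. Now we can
conclude this part of the proof identically to the last paragraph of the proof of Lemma 5.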
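
The sign of $\Lambda_{\beta,\gamma}$ can be made explicit with a short computation, which also shows why the
theorem needs $\beta \neq \gamma$. The shorthand $s := 1 + \beta - \gamma$ is ours; with it, the limits of the first-column
probabilities are $f_2^\infty = \beta / s$ and $g_2^\infty = \beta s / (\beta s + 1 - \gamma)$, and $1 - \gamma = s - \beta$, so that

```latex
\begin{align*}
g_2^\infty(\beta,\gamma) - f_2^\infty(\beta,\gamma)
  &= \frac{\beta s}{\beta s + 1 - \gamma} - \frac{\beta}{s}
   = \frac{\beta \left( s^2 - \beta s - (s - \beta) \right)}{s (\beta s + 1 - \gamma)} \\
  &= \frac{\beta (s - 1)(s - \beta)}{s (\beta s + 1 - \gamma)}
   = \frac{\beta (\beta - \gamma)(1 - \gamma)}
          {(1 + \beta - \gamma) \left( \beta (1 + \beta - \gamma) + 1 - \gamma \right)}.
\end{align*}
```

Since $\gamma \in (0, 1)$, every factor except $\beta - \gamma$ is positive. The difference is therefore positive
exactly when $\beta > \gamma$, which is case (i) of Lemma 4 (SIS puts too many ones in the last row), and
negative when $\beta < \gamma$, which is case (ii).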

It remains to deal with the case when sequential importance sampling picks the $d_c$ column
within the first $\lfloor \alpha m \rfloor$ columns. Suppose $d_c$ appears as the $k$-th column. In this case we focus
on the subtable consisting of the last $n + 1 - k$ columns with sum 1, $m'$ rows with sum 1, and
one row with sum $d'$, an instance of the form $(1, 1, \dots, 1, d')$, $(1, \dots, 1)$. We will use arguments
similar to the proof of Lemma 5.

First we express $d'$ as a function of $m'$. We have the bounds $(1 - \alpha) m \le m' \le m$ and
$d - \alpha m \le d' \le d$ where $d = \lfloor \beta m \rfloor \ge \beta m - 1$. Let $d' = \beta' m'$. Thus, $\beta - \alpha - 1/m \le \beta' \le \beta / (1 - \alpha)$.

Now we find $\alpha'$ such that for any $i \le \alpha' m'$ we will be able to derive an upper bound on the
true probability and a lower bound on the SIS probability of 1 appearing at position $(m' + 1, i)$
of the $(n + 1 - k) \times m'$ subtable, no matter how the first $k$ columns were assigned. In order
to do this, we might need to decrease $\alpha$; recall that we are free to do so as long as $\alpha$ is a
constant independent of $m$. By the derivation in the proof of Lemma 5 (see expressions (3)
and (4)), as $m'$ (and thus also $m$) tends to infinity, the upper bound on the true probability
approaches

r{a',(3') = hm --^ < hm ^ , = —-^ =: /r(«, /?,«') 

and the lower bound on the SIS probability approaches 



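$$g^\infty(\alpha', \beta') \ge \frac{(\beta - \alpha - \alpha')(1 + \beta - \alpha - \alpha')}{\beta' (1 + \beta') + 1} \ge \frac{(\beta - \alpha - \alpha')(1 + \beta - \alpha - \alpha')}{\frac{\beta}{1 - \alpha} \left( 1 + \frac{\beta}{1 - \alpha} \right) + 1} =: g_4^\infty(\alpha, \beta, \alpha'),$$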

where the last inequality holds as long as $\alpha < 1$. Notice that for fixed $\alpha, \beta$ satisfying $\alpha <
\min\{1, \beta\}$, the function $f_4^\infty$ is increasing and $g_4^\infty$ is decreasing in $\alpha'$, for $\alpha' < \beta - \alpha$. Similarly,
for fixed $\alpha', \beta$ satisfying $\alpha' < \beta$, the function $f_4^\infty$ is increasing and $g_4^\infty$ is decreasing in $\alpha$, for
$\alpha < \min\{1, \beta - \alpha'\}$. Therefore, if we take $\alpha = \alpha' < \min\{1, \beta/2\}$, we will have the bounds

$$f_4^\infty(x, \beta, y) \le f_4^\infty(\alpha, \beta, \alpha) \quad \text{and} \quad g_4^\infty(x, \beta, y) \ge g_4^\infty(\alpha, \beta, \alpha) \quad \text{for any } x, y \le \alpha.$$

Recall that $\Lambda_\beta = g^\infty(0, \beta) - f^\infty(0, \beta) = g_4^\infty(0, \beta, 0) - f_4^\infty(0, \beta, 0) > 0$. If we choose $\alpha$ so that
$g_4^\infty(\alpha, \beta, \alpha) - f_4^\infty(\alpha, \beta, \alpha) \ge \Lambda_\beta / 2$, then in similar fashion to the last paragraph of the proof of
Lemma 5, we may conclude that the SIS procedure likely fails. More precisely, let $\epsilon := \Lambda_\beta / 8$
and let $f := f_4^\infty(\alpha, \beta, \alpha) + \epsilon$ and $g := g_4^\infty(\alpha, \beta, \alpha) - \epsilon$ be the upper bound (for sufficiently large
$m$) on the true probability and the lower bound on the SIS probability of 1 appearing at the
position $(m + 1, i)$ for $i \in I := \{k + 1, \dots, k + \lfloor \alpha' m' \rfloor\}$. Therefore Lemma 4 with parameters
$\alpha(1 - \alpha)$, $\beta$, $I$ of size $|I| = \lfloor \alpha' m' \rfloor \ge \lfloor \alpha (1 - \alpha) m \rfloor$, $f$, and $g$ implies the statement of the
theorem.

Finally, if the SIS procedure constructs the tables row-by-row instead of column-by-column,
symmetrical arguments hold. This completes the proof of Theorem 2.

□ 



4 Experiments 

We performed several experimental tests which show sequential importance sampling to be a 
promising approach for certain classes of input instances. 

We ran the sequential importance sampling algorithm for binary contingency tables, using the
following stopping heuristic. Let $N = n + m$. For some $\epsilon, k > 0$ we stopped if the last $kN$
estimates were all within a $(1 + \epsilon)$ factor of the current estimate. We set $\epsilon = 0.01$ and $k = 5$.
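
One way to read this stopping rule is sketched below. The function name `run_until_stable`, the
`max_trials` safety cap, and the exact interpretation of "within a $(1 + \epsilon)$ factor" (each of the
last $kN$ running averages lies between the current estimate divided by $1 + \epsilon$ and the current
estimate times $1 + \epsilon$) are our assumptions; the original experiments may have differed in such
details.

```python
from collections import deque

def run_until_stable(trial, n, m, eps=0.01, k=5, max_trials=10**6):
    """Average the outputs of trial() until the last k*(n+m) running estimates
    all lie within a (1+eps) factor of the current estimate.
    Returns (estimate, number of trials used)."""
    window = k * (n + m)
    recent = deque(maxlen=window)        # the last `window` running averages
    total = 0.0
    for t in range(1, max_trials + 1):
        total += trial()
        est = total / t
        recent.append(est)
        if len(recent) == window and est > 0:
            if est / (1 + eps) <= min(recent) and max(recent) <= est * (1 + eps):
                return est, t
    return total / max_trials, max_trials  # cap reached without meeting the criterion

# Toy usage: a "trial" that always returns 3.0 stabilizes as soon as the window is full.
print(run_until_stable(lambda: 3.0, n=2, m=2))   # (3.0, 20)
```

In practice `trial` would be a single SIS trial returning $1/\mu(T)$. Note that such a rule only
detects that the running average has stopped moving; as the negative example shows, it cannot
detect that the average has settled on the wrong value.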

Figure 1(a) shows the evolution of the SIS estimate as a function of the number of trials
on the input with all row and column sums $r_i = c_j = 5$, and 50 x 50 matrices. In our
simulations we used the more delicate sampling mentioned in Remark 1, which guarantees
that the assignment in every column is valid, i.e., such an assignment can always be extended
to a valid table (or, equivalently, that the random variable $X_t$ is always strictly positive). Five
independent runs are depicted, together with the correct number of tables ~ 1.038 x 10^^^, 
which we computed exactly. To make the figure legible, the y-axis is scaled by a factor of 10^^" 
and it only shows the range from 10 to 10.7. Note that the algorithm appears to converge to 
the correct estimate, and our stopping heuristic appears to capture this behavior. 

In contrast, Figure 1(b) depicts the SIS evolution on the negative example from Theorem
2 with $m = 300$, $\beta = 0.6$ and $\gamma = 0.8$, i.e., the input is $(1, \dots, 1, 179)$, $(1, \dots, 1, 240)$ on a
301 x 240 matrix. In this case the correct number of tables is

$$\binom{300}{240} \binom{239}{179} \, 60! + \binom{300}{239} \binom{239}{178} \, 61!.$$

We ran the SIS algorithm under three different settings: first, we constructed the tables
column-by-column where the columns were ordered from the largest sum, as suggested in
the paper by Chen et al. (the red curves correspond to three independent runs with this
setting); second, we ordered the columns from the smallest sum (the green curves); and
third, we constructed the tables row-by-row where the rows were ordered from the largest
sum (the blue curves). The y-axis is on a logarithmic scale (base 10) and one unit on the
x-axis corresponds to 1000 SIS trials. We ran the SIS estimates for twice the number of trials
determined by our stopping heuristic to indicate that the unfavorable performance of the SIS
estimator on this example is not the result of a poor choice of stopping heuristic. Notice that
even the best estimator differs from the true value by about a factor of 40, while the blue
curves are off by more than a factor of 1000.
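
The exact count for this instance can be recomputed from the closed form of Section 3.2 with
arbitrary-precision integers. The helper name `count_bad_instance` is ours; the parameters
$m = 300$, $d_r = 179$, $d_c = 240$ are those of the instance above, which gives $n = 239$.

```python
from math import comb, factorial

def count_bad_instance(m, dr, dc):
    """Number of 0/1 tables with row sums (1,...,1,dr) over m+1 rows and
    column sums (1,...,1,dc) over n+1 columns, where n = m + dr - dc."""
    n = m + dr - dc
    return (comb(m, dc) * comb(n, dr) * factorial(m - dc)
            + comb(m, dc - 1) * comb(n, dr - 1) * factorial(m - dc + 1))

exact = count_bad_instance(300, 179, 240)
print(len(str(exact)))   # number of decimal digits of the exact count
```

Comparing the logarithm of this value with the plotted curves gives the underestimation factors
quoted above.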

Figure 2 represents the number of trials required by the SIS procedure (computed by our
stopping heuristic) on several examples for n x n matrices. The four curves correspond to 5,
10, $\lfloor 5 \log n \rfloor$ and $\lfloor n/2 \rfloor$-regular row and column sums. The x-axis represents $n$, the number
of rows and columns, and the y-axis captures the required number of SIS trials. For each $n$
and each of these row and column sums, we took 20 independent runs and we plotted the
median number of trials. For comparison, in Figure 3 we plotted the estimated running time
for our bad example from Theorem 2 (recall that this is likely the running time needed to
converge to a wrong value!) for $n + m$ ranging from 20 to 140 and various settings of $\beta, \gamma$:
0.1, 0.5 (red), 0.5, 0.5 (blue), 0.2, 0.8 (green), and 0.6, 0.8 (black). In this case it is clear that
the convergence time is considerably slower compared with the examples in Figure 2.
Figure 1: The estimate produced by sequential importance sampling as a function of the
number of trials on two different instances. In both figures, the horizontal line shows the
correct number of corresponding binary contingency tables. (a) The left instance is a 50 x
50 matrix where all $r_i = c_j = 5$. The x-axis is the number of SIS trials, and the y-axis
corresponds to the estimate scaled down by a factor of 10^^°. Five independent runs of 
sequential importance sampling are depicted. Notice that the y-axis ranges from 10 to 10.7,
a relatively small interval, thus it appears SIS converges to the correct estimate. (b) The
input instance is from Theorem 2 with $m = 300$, $\beta = 0.6$ and $\gamma = 0.8$. The estimate (y-axis)
is plotted on a logarithmic scale (base 10) and one unit on the x-axis corresponds to 1000
SIS trials. Note that in this instance SIS appears to converge to an incorrect estimate. Nine
independent runs of the SIS algorithm are shown: the red curves construct tables
column-by-column with columns sorted by decreasing sum, the blue curves construct row-by-row
with rows sorted by decreasing sum, and the green curves construct column-by-column with
columns sorted increasingly.
Figure 2: The number of SIS trials before the algorithm converges, as a function of the input
size. The curves correspond to 5 (red), 10 (blue), $\lfloor 5 \log n \rfloor$ (green), and $\lfloor n/2 \rfloor$ (black) regular
row and column sums.
Figure 3: The number of SIS trials until the algorithm converges as a function of $m + n$. The
inputs are of the type described in Theorem 2 with $\beta = 0.1$, $\gamma = 0.5$ (red), $\beta = \gamma = 0.5$ (blue),
$\beta = 0.2$, $\gamma = 0.8$ (green), and $\beta = 0.6$, $\gamma = 0.8$ (black). The right plot shows the same four
curves with the number of SIS trials plotted on a logarithmic scale. Note that the algorithm
appears to be converging in sub-exponential time. Recall from Figure 1 that it is converging
to the wrong estimate.